Introduction¶
In this mini-project, I work with the Kaggle “GAN Getting Started” dataset, which contains 300 Monet paintings and about 7,000 real photographs, all standardized to 256×256 resolution. The challenge is to translate photos into Monet-style images and submit 7,000–10,000 generated samples for evaluation using the MiFID score (Memorization-informed Fréchet Inception Distance), a variant of FID that additionally penalizes memorization of the training set. My objective is to build a reliable CycleGAN-based image-to-image translation system that learns an artistic style without paired supervision. Throughout the notebook, I preprocess the dataset, implement the generator and discriminator architectures, train the model with adversarial and cycle-consistency losses, visualize intermediate and final results, and prepare the required submission files.
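Since submissions are scored with MiFID, it helps to recall the underlying Fréchet Inception Distance: both image sets are embedded with an Inception network, each feature set is summarized by a mean μ and covariance Σ, and the distance is ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_rΣ_g)^½); MiFID then multiplies in a memorization penalty. Below is a minimal NumPy sketch of the Fréchet distance between two feature sets, assuming the Inception features have already been extracted (this is an illustration, not the evaluator Kaggle runs):

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # clamp tiny negative eigenvalues from round-off
    return (v * np.sqrt(w)) @ v.T

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two (n_samples, n_features) arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((Σr Σg)^1/2) == Tr((Σr^1/2 Σg Σr^1/2)^1/2), and the latter is PSD
    s_r = _sqrtm_psd(cov_r)
    covmean = _sqrtm_psd(s_r @ cov_g @ s_r)
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(cov_r) + np.trace(cov_g) - 2.0 * np.trace(covmean))
```

Lower is better: two identical feature sets score (near) zero, and shifting one set's mean raises the score by the squared shift.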
Motivation¶
My motivation for working on this project is to deepen my understanding of generative modeling, especially in the context of unpaired image translation tasks. Style transfer is an essential component of modern generative AI, and implementing it from the ground up offers valuable insight into training stability, loss balancing, and qualitative evaluation. This competition also provides a structured environment to practice managing datasets, organizing experiments, generating outputs at scale, and producing a submission suitable for a real benchmarking platform like Kaggle.
Real-World Applications¶
The direct outcome of this work has several meaningful applications:
Automated artistic stylization: The trained model can convert any photograph into a Monet-inspired painting, allowing users to instantly generate artistic versions of their own images.
Creative content generation: Designers, illustrators, and digital artists can use the model as a rapid prototyping tool to generate Monet-style backgrounds, textures, and visual elements without manual painting.
Dataset expansion for artistic models: The generator can produce unlimited Monet-like samples, enabling richer datasets for downstream GAN or diffusion models that require large collections of stylized images.
Interactive art tools: The learned model can be integrated into apps or web interfaces that allow users to upload images and obtain real-time artistic transformations.
In essence, the model I train in this project acts as a practical, fully functioning Monet-style generator that can be directly applied to creative media, artistic pipelines, and personal or professional digital art workflows.
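The adversarial and cycle-consistency losses mentioned in the introduction are what make unpaired training possible: each discriminator pushes its generator toward the target domain, while the cycle term forces photo→Monet→photo (and Monet→photo→Monet) round trips to reconstruct the input. A minimal NumPy sketch of the two terms follows; the notebook's actual training code uses the PyTorch equivalents (nn.MSELoss for the LSGAN objective, nn.L1Loss for the cycle term), and the weight lam = 10 is the conventional choice, assumed here:

```python
import numpy as np

def lsgan_loss(disc_out, is_real):
    """Least-squares adversarial loss: push discriminator outputs toward 1 (real) or 0 (fake)."""
    target = 1.0 if is_real else 0.0
    return float(np.mean((disc_out - target) ** 2))

def cycle_loss(real, reconstructed, lam=10.0):
    """L1 penalty: an image mapped to the other domain and back should match the original."""
    return lam * float(np.mean(np.abs(reconstructed - real)))
```

During training, each generator minimizes the adversarial term on its fakes (labeled as real) plus the cycle term, while each discriminator minimizes the adversarial term on real versus generated batches.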
1. Mounting Google Drive and Loading the Dataset¶
In this step, I mount Google Drive to access the dataset provided for the Monet style-transfer task. I then define the dataset root directory and load three folders:
- monet_jpg – all Monet paintings (≈300 images)
- photo_jpg – all real photographs (≈7k images)
- test_photo_jpg – optional test photos for inference
After constructing the directory paths, I verify the existence of the Monet and Photo folders, scan all .jpg files, and report their counts. This confirms that the dataset is fully accessible and ready for preprocessing and model training.
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
import os, glob
DATA_ROOT = "/content/drive/MyDrive/FaceRecognitionProject/full_project_backup"
MONET_DIR = os.path.join(DATA_ROOT, "monet_jpg")
PHOTO_DIR = os.path.join(DATA_ROOT, "photo_jpg")
TEST_DIR = os.path.join(DATA_ROOT, "test_photo_jpg") # may not exist
assert os.path.isdir(MONET_DIR) and os.path.isdir(PHOTO_DIR), "monet_jpg/photo_jpg not found."
monet_files = sorted(glob.glob(os.path.join(MONET_DIR, "*.jpg")))
photo_files = sorted(glob.glob(os.path.join(PHOTO_DIR, "*.jpg")))
test_files = sorted(glob.glob(os.path.join(TEST_DIR, "*.jpg"))) if os.path.isdir(TEST_DIR) else []
print("Monet count:", len(monet_files))
print("Photo count:", len(photo_files))
print("Test count:", len(test_files))
Monet count: 300
Photo count: 7038
Test count: 0
2. Reproducibility and Device Configuration¶
To ensure consistent and reproducible results across multiple runs, I fix all random seeds (Python, NumPy, and PyTorch) and enforce deterministic behavior in cuDNN. This prevents subtle nondeterministic variations during training.
I also detect the available computing device (CPU or GPU) and report the number of GPUs accessible. All subsequent model components and tensors will be moved to this device to enable accelerated training.
import os, random, numpy as np, torch
SEED = 42
random.seed(SEED); np.random.seed(SEED)
torch.manual_seed(SEED); torch.cuda.manual_seed_all(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device, "#GPUs:", torch.cuda.device_count())
Device: cuda #GPUs: 1
3. Visualizing the Monet and Photo Domains (5×5 Grids)¶
To gain an initial qualitative understanding of the dataset, I loaded a subset of Monet paintings and real landscape photographs and displayed them in fixed 5×5 grids.
I first implemented a robust image-loading function that filtered out corrupted files to ensure that all grid tiles were valid. When fewer than 25 usable samples were available, the function automatically repeated images so that the grid remained fully populated.
I then generated two visual summaries:
- a 5×5 grid of Monet paintings
- a 5×5 grid of real photos
Both figures were exported as high-resolution PNG files in the samples/ directory.
These visualizations established the stylistic contrast between the two domains and provided a qualitative baseline for the CycleGAN training.
# ==============================================
# GRID VISUALIZATION (Monet + Photos) — FIXED
# Ensures NO empty tiles, NO missing images
# ==============================================
import os, glob, random
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
def load_valid_images(folder, limit=25):
    """Safely load up to `limit` valid JPG images, skipping corrupted files."""
    paths = sorted(glob.glob(os.path.join(folder, "*.jpg")))
    valid = []
    for p in paths:
        try:
            img = Image.open(p).convert("RGB")
            valid.append(img)
        except OSError:
            continue  # skip corrupted/unreadable images
        if len(valid) >= limit:
            break
    return valid

def plot_grid(images, title, save_path=None, grid_size=(5, 5)):
    rows, cols = grid_size
    total = rows * cols
    if not images:
        raise ValueError("No valid images to plot.")
    imgs = images[:total]  # crop extras
    if len(imgs) < total:  # if fewer images, tile by repeating
        repeat = (total // len(imgs)) + 1
        imgs = (imgs * repeat)[:total]
    fig, ax = plt.subplots(rows, cols, figsize=(14, 14))
    fig.suptitle(title, fontsize=18, fontweight="bold")
    idx = 0
    for r in range(rows):
        for c in range(cols):
            ax[r, c].imshow(imgs[idx])
            ax[r, c].axis("off")
            idx += 1
    plt.tight_layout(rect=[0, 0, 1, 0.95])
    if save_path:  # save before showing so the figure is still alive
        os.makedirs(os.path.dirname(save_path), exist_ok=True)
        fig.savefig(save_path, dpi=200)
        print(f"Saved: {save_path}")
    plt.show()
    plt.close(fig)
# -------------------------------
# Load valid Monet & Photo images
# -------------------------------
monet_imgs = load_valid_images(MONET_DIR, 25)
photo_imgs = load_valid_images(PHOTO_DIR, 25)
print(f"Loaded Monet images: {len(monet_imgs)}")
print(f"Loaded Photo images: {len(photo_imgs)}")
# -------------------------------
# Plot Monet Grid
# -------------------------------
plot_grid(
    monet_imgs,
    title="Sample Monet Paintings (5×5)",
    save_path="samples/grid_monet_fixed.png",
    grid_size=(5, 5),
)
# -------------------------------
# Plot Photo Grid
# -------------------------------
plot_grid(
    photo_imgs,
    title="Sample Real Photos (5×5)",
    save_path="samples/grid_photos_fixed.png",
    grid_size=(5, 5),
)
Loaded Monet images: 25
Loaded Photo images: 25
Saved: samples/grid_monet_fixed.png